Text File | 1995-06-29 | 47.1 KB | 1,123 lines |
-
-
-
- ISPELL(1) ISPELL(1)
-
-
- NAME
- ispell, buildhash, munchlist, findaffix, tryaffix, icombine,
- ijoin - Interactive spelling checking
-
- SYNOPSIS
- ispell [common-flags] [-M|-N] [-Lcontext] [-V] files
- ispell [common-flags] -l
- ispell [common-flags] [-f file] [-s] {-a|-A}
- ispell [-d file] [-w chars] -c
- ispell [-d file] [-w chars] -e[e]
- ispell [-d file] -D
- ispell -v[v]
-
- common-flags:
- [-t] [-n] [-b] [-x] [-B] [-C] [-P] [-m] [-S] [-d file]
- [-p file] [-w chars] [-W n] [-T type]
-
- buildhash [-s] dict-file affix-file hash-file
- buildhash -s count affix-file
-
- munchlist [-l aff-file] [-c conv-file] [-T suffix]
- [-s hash-file] [-D] [-v] [-w chars] [files]
-
- findaffix [-p|-s] [-f] [-c] [-m min] [-M max] [-e elim]
- [-t tabchar] [-l low] [files]
-
- tryaffix [-p|-s] [-c] expanded-file affix[+addition]
-
- icombine [-T type] [aff-file]
-
- ijoin [-s|-u] join-options file1 file2
-
- DESCRIPTION
- Ispell is fashioned after the spell program from ITS (called
- ispell on Twenex systems). The most common usage is "ispell
- filename". In this case, ispell displays each word that does
- not appear in the dictionary at the top of the screen and
- allows you to change it. If there are "near misses" in the
- dictionary (words that differ by only a single letter, a
- missing or extra letter, a pair of transposed letters, or a
- missing space or hyphen), they are displayed on the following
- lines. As well as "near misses", ispell may display other
- guesses at ways to make the word from a known root, with each
- guess preceded by question marks. Finally, the line containing
- the word and the previous line are printed at the bottom of
- the screen. If your terminal can display in reverse video, the
- word itself is highlighted. You have the option of replacing
- the word completely, or choosing one of the suggested words.
- Commands are single characters as follows (case is ignored):
-
-
- R Replace the misspelled word completely.
-
- Space Accept the word this time only.
-
- A Accept the word for the rest of this ispell session.
-
- I Accept the word, capitalized as it is in the file, and
- update private dictionary.
-
- U Accept the word, and add an uncapitalized (actually, all
- lower-case) version to the private dictionary.
-
- 0-n Replace with one of the suggested words.
-
- L Look up words in system dictionary (controlled by the
- WORDS compilation option).
-
- X Write the rest of this file, ignoring misspellings, and
- start next file.
-
- Q Exit immediately and leave the file unchanged.
-
- ! Shell escape.
-
- ^L Redraw screen.
-
- ^Z Suspend ispell.
-
- ? Give help screen.
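-
- The "near miss" rules described above (a wrong letter, a
- missing or extra letter, a transposed pair, a missing space or
- hyphen) can be sketched in Python. This is an illustration
- only, not ispell's own code: the function name near_misses,
- the lowercase-only alphabet, and the use of a plain set as the
- dictionary are all assumptions.

```python
import string


def near_misses(word, dictionary, alphabet=string.ascii_lowercase):
    """Return dictionary words that are one simple edit away from `word`.

    Hypothetical helper; `dictionary` is any container of lowercase words.
    """
    candidates = set()
    for i in range(len(word) + 1):
        head, tail = word[:i], word[i:]
        for c in alphabet:
            candidates.add(head + c + tail)          # a letter was missing
        if tail:
            candidates.add(head + tail[1:])          # a letter was extra
            for c in alphabet:
                candidates.add(head + c + tail[1:])  # a letter was wrong
        if len(tail) > 1:
            # two adjacent letters were transposed
            candidates.add(head + tail[1] + tail[0] + tail[2:])
    candidates.discard(word)
    found = {c for c in candidates if c in dictionary}
    # a space or hyphen was missing: both halves must be known words
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in dictionary and right in dictionary:
            found.update({left + " " + right, left + "-" + right})
    return sorted(found)
```

- For a small word set such as {"fray", "frey", "fry"}, calling
- near_misses("frqy", words) yields all three entries, since
- each is a single substitution or deletion away.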
-
- If the -M switch is specified, a one-line mini-menu at the
- bottom of the screen will summarize these options. Conversely,
- the -N switch may be used to suppress the mini-menu. (The
- mini-menu is displayed by default if ispell was compiled with
- the MINIMENU option, but these two switches will always
- override the default.)
-
- If the -L flag is given, the specified number is used as the
- number of lines of context to be shown at the bottom of the
- screen. (The default is to calculate the amount of context as
- a certain percentage of the screen size.) The amount of
- context is subject to a system-imposed limit.
-
- If the -V flag is given, characters that are not in the 7-bit
- ANSI printable character set will always be displayed in the
- style of "cat -v", even if ispell thinks that these characters
- are legal ISO Latin-1 on your system. This is useful when
- working with older terminals. Without this switch, ispell will
- display 8-bit characters "as is" if they have been defined as
- string characters for the chosen file type.
-
- "Normal" mode, as well as the -l, -a, and -A options (see
- below), also accepts the following "common" flags on the
- command line:
-
- -t The input file is in TeX or LaTeX format.
-
- -n The input file is in nroff/troff format.
-
- -b Create a backup file by appending ".bak" to the name of
- the input file.
-
- -x Don't create a backup file.
-
- -B Report run-together words with missing blanks as spelling
- errors.
-
- -C Consider run-together words as legal compounds.
-
- -P Don't generate extra root/affix combinations.
-
- -m Make possible root/affix combinations that aren't in the
- dictionary.
-
- -S Sort the list of guesses by probable correctness.
-
- -d file
- Specify an alternate dictionary file. For example, use
- -d deutsch to choose a German dictionary in a German
- installation.
-
- -p file
- Specify an alternate personal dictionary.
-
- -w chars
- Specify additional characters that can be part of a word.
-
- -W n Specify length of words that are always legal.
-
- -T type
- Assume a given formatter type for all files.
-
- The -n and -t options select whether ispell runs in
- nroff/troff (-n) or TeX/LaTeX (-t) input mode. (The default is
- controlled by the DEFTEXFLAG installation option.) TeX/LaTeX
- mode is also automatically selected if an input file has the
- extension ".tex", unless overridden by the -n switch. In
- TeX/LaTeX mode, whenever a backslash ("\") is found, ispell
- will skip to the next whitespace or TeX/LaTeX delimiter.
- Certain commands contain arguments which should not be
- checked, such as labels and reference keys as are found in the
- \cite command, since they contain arbitrary, non-word
- arguments. Spell checking is also suppressed when in math
- mode. Thus, for example, given
-
- \chapter {This is a Ckapter} \cite{SCH86}
-
- ispell will find "Ckapter" but not "SCH". The -t option does
- not recognize the TeX comment character "%", so comments are
- also spell-checked. It also assumes correct LaTeX syntax.
- Arguments to infrequently used commands and some optional
- arguments are sometimes checked unnecessarily. The
- bibliography will not be checked if ispell was compiled with
- IGNOREBIB defined. Otherwise, the bibliography will be checked
- but the reference key will not.
-
- References for the tib(1) bibliography system, that is, text
- between a ``[.'' or ``<.'' and ``.]'' or ``.>'', will always
- be ignored in TeX/LaTeX mode.
-
- The -b and -x options control whether ispell leaves a backup
- (.bak) file for each input file. The .bak file contains the
- pre-corrected text. If there are file opening or writing
- errors, the .bak file may be left for recovery purposes even
- with the -x option. The default for this option is controlled
- by the DEFNOBACKUPFLAG installation option.
-
- The -B and -C options control how ispell handles run-together
- words, such as "notthe" for "not the". If -B is specified,
- such words will be considered as errors, and ispell will list
- variations with an inserted blank or hyphen as possible
- replacements. If -C is specified, run-together words will be
- considered to be legal compounds, so long as both components
- are in the dictionary, and each component is at least as long
- as a language-dependent minimum (3 characters, by default).
- This is useful for languages such as German and Norwegian,
- where many compound words are formed by concatenation. (Note
- that compounds formed from three or more root words will still
- be considered errors.) The default for this option is
- language-dependent; in a multi-lingual installation the
- default may vary depending on which dictionary you choose.
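-
- The compound rule for -C can be illustrated with a short
- Python sketch. It assumes a plain set of lowercase words as
- the dictionary; the name is_legal_compound and the min_len
- parameter are hypothetical and only mirror the description
- above (exactly two components, each in the dictionary, each at
- least the minimum length).

```python
def is_legal_compound(word, dictionary, min_len=3):
    """True if `word` splits into exactly two dictionary words,
    each at least `min_len` characters long (hypothetical helper)."""
    # Only split points that leave min_len characters on both sides.
    for i in range(min_len, len(word) - min_len + 1):
        if word[:i] in dictionary and word[i:] in dictionary:
            return True
    return False
```

- With {"not", "the", "boot"} as the dictionary, "notthe" is
- accepted, while "nottheboot" is rejected because it needs
- three roots.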
-
- The -P and -m options control when ispell automatically
- generates suggested root/affix combinations for possible
- addition to your personal dictionary. (These are the entries
- in the "guess" list which are preceded by question marks.) If
- -P is specified, such guesses are displayed only if ispell
- cannot generate any possibilities that match the current
- dictionary. If -m is specified, such guesses are always
- displayed. This can be useful if the dictionary has a limited
- word list, or a word list with few suffixes. However, you
- should be careful when using this option, as it can generate
- guesses that produce illegal words. The default for this
- option is controlled by the dictionary file used.
-
- The -S option suppresses ispell's normal behavior of sorting
- the list of possible replacement words. Some people may prefer
- this, since it somewhat enhances the probability that the
- correct word will be low-numbered.
-
- The -d option is used to specify an alternate hashed
- dictionary file, other than the default. If the filename does
- not contain a "/", the library directory for the default
- dictionary file is prefixed; thus, to use a dictionary in the
- local directory, "-d ./xxx.hash" must be used. This is useful
- to allow dictionaries for alternate languages. Unlike previous
- versions of ispell, a dictionary of /dev/null is illegal,
- because the dictionary contains the affix table. If you need
- an effectively empty dictionary, create a one-entry list with
- an unlikely string (e.g., "qqqqq").
-
- The -p option is used to specify an alternate personal
- dictionary file. If the file name does not begin with "/",
- $HOME is prefixed. Also, the shell variable WORDLIST may be
- set, which renames the personal dictionary in the same manner.
- The command line overrides any WORDLIST setting. If neither
- the -p switch nor the WORDLIST environment variable is given,
- ispell will search for a personal dictionary in both the
- current directory and $HOME, creating one in $HOME if none is
- found. The preferred name is constructed by prefixing the base
- name of the hash file with ".ispell_". For example, if you use
- the English dictionary, your personal dictionary would be
- named ".ispell_english". However, if the file ".ispell_words"
- exists, it will be used as the personal dictionary regardless
- of the language hash file chosen. This feature is included
- primarily for backwards compatibility.
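-
- The naming rules for the personal dictionary can be sketched
- as follows. This is a simplified illustration, not ispell's
- actual lookup: the function name and parameters are
- hypothetical, stripping a ".hash" extension to obtain the base
- name is an assumption, and the search of the current
- directory and the ".ispell_words" fallback are left out.

```python
import posixpath


def personal_dictionary_path(hash_file, home, p_option=None, wordlist=None):
    """Pick the preferred personal dictionary path (simplified sketch).

    p_option: value of -p, if given; wordlist: value of WORDLIST, if set.
    """
    # The command line overrides any WORDLIST setting.
    name = p_option if p_option is not None else wordlist
    if name is not None:
        # Names that do not begin with "/" are taken relative to $HOME.
        return name if name.startswith("/") else posixpath.join(home, name)
    base = posixpath.basename(hash_file)
    if base.endswith(".hash"):  # assumption: base name drops ".hash"
        base = base[: -len(".hash")]
    return posixpath.join(home, ".ispell_" + base)
```

- For instance, a hash file named "english.hash" and a home
- directory of "/home/u" give "/home/u/.ispell_english".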
-
- If the -p option is not specified, ispell will look for
- personal dictionaries in both the current directory and the
- home directory. If dictionaries exist in both places, they
- will be merged. If any words are added to the personal
- dictionary, they will be written to the current directory if a
- dictionary already existed in that place; otherwise they will
- be written to the dictionary in the home directory.
-
- The -w option may be used to specify characters other than
- alphabetics which may also appear in words. For instance,
- -w "&" will allow "AT&T" to be picked up. Underscores are
- useful in many technical documents. There is an admittedly
- crude provision in this option for 8-bit international
- characters. Non-printing characters may be specified in the
- usual way by inserting a backslash followed by the octal
- character code; e.g., "\014" for a form feed. Alternatively,
- if "n" appears in the character string, the (up to) three
- characters following are a DECIMAL code 0 - 255, for the
- character. For example, to include bells and form feeds in
- your words (an admittedly silly thing to do, but aren't most
- pedagogical examples):
-
- n007n012
-
- Numeric digits other than the three following "n" are simply
- numeric characters. Use of "n" does not conflict with anything
- because actual alphabetics have no meaning - alphabetics are
- already accepted. Ispell will typically be used with input
- from a file, meaning that preserving parity for possible 8 bit
- characters from the input text is OK. If you specify the -l
- option, and actually type text from the terminal, this may
- create problems if your stty settings preserve parity.
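-
- The two escape forms accepted by -w can be illustrated with a
- small decoder. This is a sketch based only on the description
- above, not on ispell's source; the name decode_w_chars is
- hypothetical, and ignoring codes above 255 is an assumption.

```python
def decode_w_chars(spec):
    """Decode a -w style string into the set of characters it names.

    A backslash introduces up to three octal digits; the letter "n"
    introduces up to three decimal digits (hypothetical helper).
    """
    chars = set()
    i = 0
    while i < len(spec):
        c = spec[i]
        if c == "\\" or c == "n":
            digits = "01234567" if c == "\\" else "0123456789"
            base = 8 if c == "\\" else 10
            j = i + 1
            while j < len(spec) and j < i + 4 and spec[j] in digits:
                j += 1
            if j > i + 1:
                code = int(spec[i + 1:j], base)
                if code <= 255:  # assumption: out-of-range codes are dropped
                    chars.add(chr(code))
            i = j
        else:
            chars.add(c)
            i += 1
    return chars
```

- Under these assumptions, "n007n012" decodes to the bell and
- form-feed characters, and "\014" decodes to a form feed alone.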
-
- The -W option may be used to change the length of words that
- ispell always accepts as legal. Normally, ispell will accept
- all 1-character words as legal, which is equivalent to
- specifying "-W 1". (The default for this switch is actually
- controlled by the MINWORD installation option, so it may vary
- at your installation.) If you want all words to be checked
- against the dictionary, regardless of length, you might want
- to specify "-W 0". On the other hand, if your document
- specifies a lot of three-letter acronyms, you would specify
- "-W 3" to accept all words of three letters or less.
- Regardless of the setting of this option, ispell will only
- generate words that are in the dictionary as suggested
- replacements for words; this prevents the list from becoming
- too long. Obviously, this option can be very dangerous, since
- short misspellings may be missed. If you use this option a
- lot, you should probably make a last pass without it before
- you publish your document, to protect yourself against errors.
-
- The -T option is used to specify a default formatter type for
- use in generating string characters. This switch overrides the
- default type determined from the file name. The type argument
- may be either one of the unique names defined in the language
- affix file (e.g., nroff) or a file suffix including the dot
- (e.g., .tex). If no -T option appears and no type can be
- determined from the file name, the default string character
- type declared in the language affix file will be used.
-
- The -l or "list" option to ispell is used to produce a list of
- misspelled words from the standard input.
-
- The -a option is intended to be used from other programs
- through a pipe. In this mode, ispell prints a one-line version
- identification message, and then begins reading lines of
- input. For each input line, a single line is written to the
- standard output for each word checked for spelling on the
- line. If the word was found in the main dictionary, or your
- personal dictionary, then the line contains only a '*'. If the
- word was found through affix removal, then the line contains a
- '+', a space, and the root word. If the word was found through
- compound formation (concatenation of two words, controlled by
- the -C option), then the line contains only a '-'.
-
- If the word is not in the dictionary, but there are near
- misses, then the line contains an '&', a space, the misspelled
- word, a space, the number of near misses, a space, the number
- of characters between the beginning of the line and the
- beginning of the misspelled word, a colon, another space, and
- a list of the near misses separated by commas and spaces.
- Following the near misses (and identified only by the count of
- near misses), if the word could be formed by adding (illegal)
- affixes to a known root, is a list of suggested derivations,
- again separated by commas and spaces. If there are no near
- misses at all, the line format is the same, except that the
- '&' is replaced by '?' (and the near-miss count is always
- zero). The suggested derivations following the near misses are
- in the form:
-
- [prefix+] root [-prefix] [-suffix] [+suffix]
-
- (e.g., "re+fry-y+ies" to get "refries") where each optional
- prefix and suffix is a string. Also, each near miss or guess
- is capitalized the same as the input word unless such
- capitalization is illegal; in the latter case each near miss
- is capitalized correctly according to the dictionary.
-
- Finally, if the word does not appear in the dictionary, and
- there are no near misses, then the line contains a '#', a
- space, the misspelled word, a space, and the character offset
- from the beginning of the line. Each sentence of text input is
- terminated with an additional blank line, indicating that
- ispell has completed processing the input line.
-
- These output lines can be summarized as follows:
-
- OK: *
-
- Root: + <root>
-
- Compound: -
-
- Miss: & <original> <count> <offset>: <miss>, <miss>, ...,
- <guess>, ...
-
- Guess: ? <original> 0 <offset>: <guess>, <guess>, ...
-
- None: # <original> <offset>
-
- For example, a dummy dictionary containing the words "fray",
- "Frey", "fry", and "refried" might produce the following
- response to the command
- "echo 'frqy refries' | ispell -a -m -d ./test.hash":
- (#) International Ispell Version 3.0.05 (beta), 08/10/91
- & frqy 3 0: fray, Frey, fry
- & refries 1 5: refried, re+fry-y+ies
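-
- A driving program has to split these result lines back into
- fields. The following Python sketch parses the formats
- summarized above. The function name parse_result_line and the
- dictionary keys it returns are hypothetical, and the version
- banner is assumed to have been skipped by the caller.

```python
def parse_result_line(line):
    """Parse one -a result line into a dict (hypothetical helper).

    Returns None for the blank line that ends each input line.
    """
    line = line.rstrip("\n")
    if line == "":
        return None
    if line == "*":
        return {"kind": "ok"}
    if line == "-":
        return {"kind": "compound"}
    if line.startswith("+ "):
        return {"kind": "root", "root": line[2:]}
    if line.startswith("# "):
        _, original, offset = line.split(" ")
        return {"kind": "none", "original": original, "offset": int(offset)}
    if line[:2] in ("& ", "? "):
        head, _, tail = line.partition(": ")
        _, original, count, offset = head.split(" ")
        items = tail.split(", ")
        n = int(count)
        return {
            "kind": "miss" if line[0] == "&" else "guess",
            "original": original,
            "offset": int(offset),
            "misses": items[:n],    # the first <count> entries are near misses
            "guesses": items[n:],   # the remainder are root/affix guesses
        }
    raise ValueError("unrecognized result line: %r" % line)
```

- Applied to the second result line of the example above, this
- gives one near miss ("refried"), one guess ("re+fry-y+ies"),
- and an offset of 5.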
-
- This mode is also suitable for interactive use when you want
- to figure out the spelling of a single word.
-
- The -A option works just like -a, except that if a line begins
- with the string "&Include_File&", the rest of the line is
- taken as the name of a file to read for further words. Input
- returns to the original file when the include file is
- exhausted. Inclusion may be nested up to five deep. The key
- string may be changed with the environment variable
- INCLUDE_STRING (the ampersands, if any, must be included).
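-
- The include mechanism of -A can be modelled as a recursive
- expansion of input lines. The sketch below is illustrative
- only: expand_includes and read_file are hypothetical names,
- trimming whitespace around the file name is an assumption, and
- the manual does not say what happens beyond five levels, so
- raising an error there is also an assumption.

```python
def expand_includes(lines, read_file, key="&Include_File&",
                    max_depth=5, _depth=0):
    """Yield input lines, replacing include lines by the file's lines.

    read_file(name) must return an iterable of lines (hypothetical hook).
    """
    for line in lines:
        if line.startswith(key):
            if _depth >= max_depth:
                # Assumption: nesting deeper than max_depth is an error.
                raise ValueError("includes nested more than %d deep" % max_depth)
            name = line[len(key):].strip()
            # Input returns to the current source once the include is exhausted.
            yield from expand_includes(read_file(name), read_file,
                                       key, max_depth, _depth + 1)
        else:
            yield line
```

- Passing a dictionary's lookup method as read_file is enough to
- try the function without touching the file system.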
-
- When in the -a mode, ispell will also accept lines of single
- words prefixed with any of '*', '&', '@', '+', '-', '~', '#',
- '!', '%', or '^'. A line starting with '*' tells ispell to
- insert the word into the user's dictionary (similar to the I
- command). A line starting with '&' tells ispell to insert an
- all-lowercase version of the word into the user's dictionary
- (similar to the U command). A line starting with '@' causes
- ispell to accept this word in the future (similar to the A
- command). A line starting with '+', followed immediately by
- tex or nroff, will cause ispell to parse future input
- according to the syntax of that formatter. A line consisting
- solely of a '+' will place ispell in TeX/LaTeX mode (similar
- to the -t option) and '-' returns ispell to nroff/troff mode
- (but these commands are obsolete). However, string character
- type is not changed; the '~' command must be used to do this.
- A line starting with '~' causes ispell to set internal
- parameters (in particular, the default string character type)
- based on the filename given in the rest of the line. (A file
- suffix is sufficient, but the period must be included. Instead
- of a file name or suffix, a unique name, as listed in the
- language affix file, may be specified.) However, the formatter
- parsing is not changed; the '+' command must be used to change
- the formatter. A line prefixed with '#' will cause the
- personal dictionary to be saved. A line prefixed with '!' will
- turn on terse mode (see below), and a line prefixed with '%'
- will return ispell to normal (non-terse) mode. Any input
- following the prefix characters '+', '-', '#', '!', or '%' is
- ignored, as is any input following the filename on a '~' line.
- To allow spell-checking of lines beginning with these
- characters, a line starting with '^' has that character
- removed before it is passed to the spell-checking code. It is
- recommended that programmatic interfaces prefix every data
- line with an uparrow to protect themselves against future
- changes in ispell.
-
- To summarize these:
-
-
- * Add to personal dictionary
-
- & Add all-lowercase version to personal dictionary
-
- @ Accept word, but leave out of dictionary
-
- # Save current personal dictionary
-
- ~ Set parameters based on filename
-
- + Enter TeX mode
-
- - Exit TeX mode
-
- ! Enter terse mode
-
- % Exit terse mode
-
- ^ Spell-check rest of line
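-
- A program that feeds text to ispell -a might follow the
- recommendation above with helpers like these. The names
- COMMAND_PREFIXES, would_be_command, and data_line are
- hypothetical; the prefix characters are the ones listed in
- this section.

```python
# Characters that ispell -a treats as command prefixes at line start.
COMMAND_PREFIXES = tuple("*&@+-~#!%^")


def would_be_command(text):
    """True if an unprotected line would be read as a command."""
    return text.startswith(COMMAND_PREFIXES)


def data_line(text):
    """Prefix a line with '^' so that it is always spell-checked."""
    return "^" + text
```

- Sending data_line(text) for every line of user text, rather
- than only for lines where would_be_command(text) is true,
- keeps the caller safe if further prefix characters are added.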
-
- In terse mode, ispell will not print lines beginning with '*',
- '+', or '-', all of which indicate correct words. This
- significantly improves running speed when the driving program
- is going to ignore correct words anyway.
-
- The -s option is only valid in conjunction with the -a or -A
- options, and only on BSD-derived systems. If specified, ispell
- will stop itself with a SIGTSTP signal after each line of
- input. It will not read more input until it receives a SIGCONT
- signal. This may be useful for handshaking with certain text
- editors.
-
- The -f option is only valid in conjunction with the -a or -A
- options. If -f is specified, ispell will write its results to
- the given file, rather than to standard output.
-
- The -v option causes ispell to print its current version
- identification on the standard output and exit. If the switch
- is doubled, ispell will also print the options that it was
- compiled with.
-
- The -c, -e[1-4], and -D options of ispell are primarily
- intended for use by the munchlist shell script. The -c switch
- causes a list of words to be read from the standard input. For
- each word, a list of possible root words and affixes will be
- written to the standard output. Some of the root words will be
- illegal and must be filtered from the output by other means;
- the munchlist script does this. As an example, the command:
-
- echo BOTHER | ispell -c
-
- produces:
-
- BOTHER BOTHE/R BOTH/R
-
- The -e switch is the reverse of -c; it expands affix flags to
- produce a list of words. For example, the command:
-
- echo BOTH/R | ispell -e
-
- produces:
-
- BOTH BOTHER
-
- An optional expansion level can also be specified. A level of
- 1 (-e1) is the same as -e alone. A level of 2 causes the
- original root/affix combination to be prepended to the line:
-
- BOTH/R BOTH BOTHER
-
- A level of 3 causes multiple lines to be output, one for each
- generated word, with the original root/affix combination
- followed by the word it creates:
-
- BOTH/R BOTH
- BOTH/R BOTHER
-
- A level of 4 causes a floating-point number to be appended to
- each of the level-3 lines, giving the ratio of the total
- length of all generated words (including the root) to the
- length of the root:
-
- BOTH/R BOTH 2.500000
- BOTH/R BOTHER 2.500000
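-
- The four output layouts can be reproduced with a short Python
- sketch. It only formats an expansion that is already known; it
- does not apply affix rules. The name format_expansion is
- hypothetical, and the list of words is assumed to include the
- root itself, as in the examples above.

```python
def format_expansion(entry, words, level=1):
    """Format the words generated from `entry` (e.g. "BOTH/R").

    `words` is the already-expanded list, root included (assumption).
    """
    if level == 1:
        return [" ".join(words)]
    if level == 2:
        return [" ".join([entry] + list(words))]
    lines = ["%s %s" % (entry, w) for w in words]
    if level == 3:
        return lines
    if level == 4:
        root = entry.split("/")[0]
        # Total length of all generated words divided by the root length.
        ratio = sum(len(w) for w in words) / len(root)
        return ["%s %f" % (line, ratio) for line in lines]
    raise ValueError("expansion level must be 1 to 4")
```

- For "BOTH/R" the two generated words have lengths 4 and 6, so
- the level-4 ratio is 10 / 4 = 2.5, which matches the sample
- output.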
-
- Finally, the -D flag causes the affix tables from the
- dictionary file to be dumped to standard output.
-
- Unless your system administrator has suppressed the feature to
- save space, ispell is aware of the correct capitalizations of
- words in the dictionary and in your personal dictionary. As
- well as recognizing words that must be capitalized (e.g.,
- George) and words that must be all-capitals (e.g., NASA), it
- can also handle words with "unusual" capitalization (e.g.,
- "ITCorp" or "TeX"). If a word is capitalized incorrectly, the
- list of possibilities will include all acceptable
- capitalizations. (More than one capitalization may be
- acceptable; for example, my dictionary lists both "ITCorp" and
- "ITcorp".)
-
- Normally, this feature will not cause you surprises, but there
- is one circumstance you need to be aware of. If you use "I" to
- add a word to your dictionary that is at the beginning of a
- sentence (e.g., the first word of this paragraph if "normally"
- were not in the dictionary), it will be marked as
- "capitalization required". A subsequent usage of this word
- without capitalization (e.g., the quoted word in the previous
- sentence) will be considered a misspelling by ispell, and it
- will suggest the capitalized version. You must then compare
- the actual spellings by eye, and then type "I" to add the
- uncapitalized variant to your personal dictionary. You can
- avoid this problem by using "U" to add the original word,
- rather than "I".
-
- The rules for capitalization are as follows:
-
- (1) Any word may appear in all capitals, as in headings.
-
- (2) Any word that is in the dictionary in all-lowercase form
- may appear either in lowercase or capitalized (as at the
- beginning of a sentence).
-
- (3) Any word that has "funny" capitalization (i.e., it contains
- both cases and there is an uppercase character besides the
- first) must appear exactly as in the dictionary, except as
- permitted by rule (1). If the word is acceptable in
- all-lowercase, it must appear thus in a dictionary entry.
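-
- The three rules can be expressed as a small Python check. This
- is a sketch of the rules as stated, not of ispell's internal
- representation; capitalization_ok is a hypothetical name and
- the dictionary is assumed to be a plain collection of entries
- spelled with their required capitalization.

```python
def capitalization_ok(word, dictionary):
    """Check `word` against the capitalization rules (1) to (3)."""
    entries = [e for e in dictionary if e.lower() == word.lower()]
    if not entries:
        return False                  # not in the dictionary at all
    if word == word.upper():
        return True                   # rule (1): all capitals is always fine
    for entry in entries:
        if entry == entry.lower():
            # rule (2): lowercase entries may also appear capitalized
            if word == entry or word == entry.capitalize():
                return True
        elif word == entry:
            # capitalized, all-capitals and "funny" entries must match exactly
            return True
    return False
```

- With entries such as "normally", "George", "NASA", and "TeX",
- the check accepts "Normally", "GEORGE", and "TeX" but rejects
- "george", "Nasa", and "Tex".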
-
- buildhash
- The buildhash program builds hashed dictionary files for later
- use by ispell. The raw word list (with affix flags) is given
- in dict-file, and the affix flags are defined by affix-file.
- The hashed output is written to hash-file. The formats of the
- two input files are described in ispell(4). The -s (silent)
- option suppresses the usual status messages that are written
- to the standard error device.
-
- munchlist
- The munchlist shell script is used to reduce the size of
- dictionary files, primarily personal dictionary files. It is
- also capable of combining dictionaries from various sources.
- The given files are read (standard input if no arguments are
- given), reduced to a minimal set of roots and affixes that
- will match the same list of words, and written to standard
- output.
-
- Input for munchlist consists of raw words (e.g., from your
- personal dictionary files) or root and affix combinations
- (probably generated in earlier munchlist runs). Each word or
- root/affix combination must be on a separate line.
-
- The -D (debug) option leaves temporary files around under
- standard names instead of deleting them, so that the script
- can be debugged. Warning: this option can eat up an enormous
- amount of temporary file space.
-
- The -v (verbose) option causes progress messages to be
- reported to stderr so you won't get nervous that munchlist has
- hung.
-
- If the -s (strip) option is specified, words that are in the
- specified hash-file are removed from the word list. This can
- be useful with personal dictionaries.
-
- The -l option can be used to specify an alternate affix-file
- for munching dictionaries in languages other than English.
-
- The -c option can be used to convert dictionaries that were
- built with an older affix file, without risk of accidentally
- introducing unintended affix combinations into the dictionary.
-
- The -T option allows dictionaries to be converted to a
- canonical string-character format. The suffix specified is
- looked up in the affix file (-l switch) to determine the
- string-character format used for the input file; the output
- always uses the canonical string-character format. For
- example, a dictionary collected from TeX source files might be
- converted to canonical format by specifying -T tex.
-
- The -w option is passed on to ispell.
-
- findaffix
- The findaffix shell script is an aid to writers of new
- language descriptions in choosing affixes. The given
- dictionary files (standard input if none are given) are
- examined for possible prefixes (-p switch) or suffixes (-s
- switch, the default). Each commonly-occurring affix is
- presented along with a count of the number of times it appears
- and an estimate of the number of bytes that would be saved in
- a dictionary hash file if it were added to the language table.
- Only affixes that generate legal roots (found in the original
- input) are listed.
-
- If the -c option is not given, the output lines are in the
- following format:
-
- strip/add/count/bytes
-
- where strip is the string that should be stripped from a root
- word before adding the affix, add is the affix to be added,
- count is a count of the number of times that this strip/add
- combination appears, and bytes is an estimate of the number of
- bytes that might be saved in the raw dictionary file if this
- combination is added to the affix file. The field separator in
- the output will be the tab character specified by the -t
- switch; the default is a slash ("/").
-
- If the -c ("clean output") option is given, the appearance of
- the output is made visually cleaner (but harder to
- post-process) by changing it to:
-
- -strip+add<tab>count<tab>bytes
-
- where strip, add, count, and bytes are as before, and <tab>
- represents the ASCII tab character.
-
- The method used to generate possible affixes will also
- generate longer affixes which have common headers or trailers.
- For example, the two words "moth" and "mother" will generate
- not only the obvious substitution "+er" but also "-h+her" and
- "-th+ther" (and possibly even longer ones, depending on the
- value of min). To prevent cluttering the output with such
- affixes, any affix pair that shares a common header (or, for
- prefixes, trailer) string longer than elim characters (default
- 1) will be suppressed. You may want to set "elim" to a value
- greater than 1 if your language has string characters; usually
- the need for this parameter will become obvious when you
- examine the output of your findaffix run.
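-
- The "moth"/"mother" example can be reproduced with a short
- Python sketch for the suffix case. It is not the findaffix
- script's algorithm: the names suffix_pairs, shared_header, and
- filter_pairs are hypothetical, and the filter takes "longer
- than elim" literally, which is an interpretation of the text
- above.

```python
def suffix_pairs(root, word, min_stem=3):
    """List (strip, add) pairs that turn `root` into `word`,
    for every stem of at least `min_stem` characters."""
    common = 0
    while common < min(len(root), len(word)) and root[common] == word[common]:
        common += 1
    # Shorter stems give longer strip/add strings with a shared header.
    return [(root[n:], word[n:]) for n in range(common, min_stem - 1, -1)]


def shared_header(strip, add):
    """Length of the common leading string of `strip` and `add`."""
    n = 0
    while n < min(len(strip), len(add)) and strip[n] == add[n]:
        n += 1
    return n


def filter_pairs(pairs, elim=1):
    """Drop pairs whose shared header is longer than `elim` characters."""
    return [(s, a) for s, a in pairs if shared_header(s, a) <= elim]
```

- With a minimum stem of 2, suffix_pairs("moth", "mother", 2)
- lists the three substitutions named above; filtering with
- elim set to 1 then removes "-th+ther", whose shared header
- "th" is two characters long.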
-
- Normally, the affixes are sorted according to the estimate of
- bytes saved. The -f switch may be used to cause the affixes to
- be sorted by frequency of appearance.
-
- To save output file space, affixes which occur fewer than 10
- times are eliminated; this limit may be changed with the -l
- switch. The -M switch specifies a maximum affix length
- (default 8). Affixes longer than this will not be reported.
- (This saves on temporary disk space and makes the script run
- faster.)
-
- Affixes which generate stems shorter than 3 characters are
- suppressed. (A stem is the word after the strip string has
- been removed, and before the add string has been added.) This
- reduces both the running time and the size of the output file.
- This limit may be changed with the -m switch. The minimum stem
- length should only be set to 1 if you have a lot of free time
- and disk space (in the range of many days and hundreds of
- megabytes).
-
- The findaffix script requires a non-blank field-separator
- character for internal use. Normally, this character is a
- slash ("/"), but if the slash appears as a character in the
- input word list, a different character can be specified with
- the -t switch.
-
- Ispell dictionaries should be expanded before being fed to
- findaffix; in addition, characters that are not in the English
- alphabet (if any) should be translated to lowercase.
-
- tryaffix
- The tryaffix shell script is used to estimate the effectiveness
- of a proposed prefix (-p switch) or suffix (-s switch, the
- default) with a given expanded-file. Only one affix can be
- tried with each execution of tryaffix, although multiple
- arguments can be used to describe varying forms of the same
- affix flag (e.g., the D flag for English can add either D or
- ED depending on whether a trailing E is already present).
- Each word in the expanded dictionary that ends (or begins)
- with the chosen suffix (or prefix) has that suffix (prefix)
- removed; the dictionary is then searched for root words that
- match the stripped word. Normally, all matching roots are
- written to standard output, but if the -c (count) flag is
- given, only a statistical summary of the results is written.
- The statistics given are a count of words the affix
- potentially applies to and an estimate of the number of
- dictionary bytes that a flag using the affix would save. The
- estimate will be too high if the flag generates words that
- are currently generated by other affix flags (e.g., in
- English, bathers can be generated by either bath/X or
- bather/S).
-
- The dictionary file, expanded-file, must already be
- expanded (using the -e switch of ispell) and sorted, and
- things will usually work best if uppercase has been folded
- to lowercase with tr(1).
-
- The affix arguments are things to be stripped from the
- dictionary file to produce trial roots: for English, con
- (prefix) and ing (suffix) are examples. The addition
- parts of the argument are letters that would have been
- stripped off the root before adding the affix. For example,
- in English the affix ing normally strips e for words
- ending in that letter (e.g., like becomes liking) so we
- might run:
-
- tryaffix ing ing+e
-
- to cover both cases.
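-
- The matching step that tryaffix performs for suffixes can be
- sketched in Python as follows. This is an illustration under
- stated assumptions, not the script itself: the function name
- try_suffix is hypothetical, the expanded dictionary is taken
- to be an in-memory list of lowercase words, and only the
- suffix case is handled.
-
```python
def try_suffix(expanded_words, specs):
    """Return (word, root) pairs for suffix specs such as "ing" or "ing+e".

    Hypothetical sketch of the tryaffix matching idea, suffix case only.
    """
    vocab = set(expanded_words)
    matches = []
    for spec in specs:
        # "ing+e" means: strip "ing", then put back the "e" that the
        # affix would have removed from the root.
        suffix, _, addition = spec.partition("+")
        for word in sorted(vocab):
            if word.endswith(suffix) and len(word) > len(suffix):
                root = word[: len(word) - len(suffix)] + addition
                if root in vocab:
                    matches.append((word, root))
    return matches
```
-
- With the words like, liking, walk and walking, the spec "ing"
- pairs walking with walk and the spec "ing+e" pairs liking
- with like, which is why both arguments are needed.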
-
- All of the shell scripts contain documentation as commentary
- at the beginning; sometimes these comments contain
- useful information beyond the scope of this manual page.
-
- It is possible to install ispell so that it supports only
- ASCII-range text, if desired.
-
- icombine
- The icombine program is a helper for munchlist. It reads
- a list of words in dictionary format (roots plus flags)
- from the standard input, and produces a reduced list on
- standard output which combines common roots found on adjacent
- entries. Identical roots which have differing flags
- will have their flags combined, and roots which have differing
- capitalizations will be combined in a way which
- only preserves important capitalization information. The
- optional aff-file specifies a language file which defines
- the character sets used and the meanings of the various
- flags. The -T switch can be used to select among alternative
- string character types by giving a dummy suffix that
- can be found in an altstringtype statement.
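-
- The flag-combining part of this behavior can be sketched in
- Python. This is a simplified illustration, not icombine's
- code: the function name combine_adjacent is hypothetical,
- entries are assumed to look like root/FLAGS with
- single-character flags (as in bath/X), and the capitalization
- handling is left out entirely.
-
```python
def combine_adjacent(entries):
    """Merge flags of adjacent entries that share the same root.

    Hypothetical sketch; capitalization merging is not modeled.
    """
    combined = []
    for entry in entries:
        root, _, flags = entry.partition("/")
        if combined and combined[-1][0] == root:
            # Same root as the previous entry: union the flag sets.
            combined[-1][1].update(flags)
        else:
            combined.append((root, set(flags)))
    return [
        root + ("/" + "".join(sorted(flags)) if flags else "")
        for root, flags in combined
    ]
```
-
- Because only adjacent entries are compared, the input must
- already be sorted so that equal roots sit next to each other.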
-
- ijoin
- The ijoin program is a re-implementation of join(1) which
- handles long lines and 8-bit characters correctly. The -s
- switch specifies that the sort(1) program used to prepare
- the input to ijoin uses signed comparisons on 8-bit characters;
- the -u switch specifies that sort(1) uses unsigned
- comparisons. All other options and behaviors of join(1)
- are duplicated as exactly as possible based on the manual
- page, except that ijoin will not handle newline as a field
- separator. See the join(1) manual page for more information.
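-
- The difference between signed and unsigned comparison matters
- only for bytes with the high bit set. The Python sketch below
- shows how the two conventions order the same three words; the
- key functions and sample words are illustrative assumptions
- and say nothing about how ijoin or any particular sort(1) is
- written.
-
```python
def unsigned_key(line: bytes):
    # Each byte compares as a value in 0..255.
    return list(line)


def signed_key(line: bytes):
    # Each byte compares as a value in -128..127, as with a signed char.
    return [b - 256 if b >= 128 else b for b in line]


# b"\xe9t\xe9" is an 8-bit (Latin-1) word whose first byte has the high bit set.
words = [b"zebra", b"\xe9t\xe9", b"apple"]
unsigned_order = sorted(words, key=unsigned_key)
signed_order = sorted(words, key=signed_key)
```
-
- Under unsigned comparison the 8-bit word sorts after every
- ASCII word; under signed comparison it sorts before them. A
- join that assumes the wrong order would miss matching lines,
- which is the situation the -s and -u switches address.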
-
- ENVIRONMENT
- DICTIONARY
- Default dictionary to use, if no -d flag is given.
-
- WORDLIST
- Personal dictionary file name
-
- INCLUDE_STRING
- Code for file inclusion under the -A option
-
- TMPDIR
- Directory used for some of munchlist's temporary
- files
-
- FILES
- /usr/local/lib/english.hash
- Hashed dictionary (may be found in some other local
- directory, depending on the system).
-
- /usr/local/lib/english.aff
- Affix-definition file for munchlist
-
- /usr/dict/web2 or /usr/dict/words
- For the Lookup function (depending on the WORDS
- compilation option).
-
- $HOME/.ispell_hashfile
- User's private dictionary
-
- .ispell_hashfile
- Directory-specific private dictionary
-
- SEE ALSO
- spell(1), egrep(1), look(1), join(1), sort(1), sq(1L),
- tib(1L), ispell(4L), english(4L)
-
- BUGS
- It takes several to many seconds for ispell to read in the
- hash table, depending on size.
-
- When all options are enabled, ispell may take several seconds
- to generate all the guesses at corrections for a misspelled
- word; on slower machines this time is long enough
- to be annoying.
-
- The hash table is stored as a quarter-megabyte (or larger)
- array, so a PDP-11 or 286 version does not seem likely.
-
- Ispell should understand more troff syntax, and deal more
- intelligently with contractions.
-
- Although small personal dictionaries are sorted before
- they are written out, the order of capitalizations of the
- same word is somewhat random.
-
- When the -x flag is specified, ispell will unlink any
- existing .bak file.
-
- There are too many flags, and many of them have non-
- mnemonic names.
-
- Munchlist does not deal very gracefully with dictionaries
- which contain "non-word" characters. Such characters
- ought to be deleted from the dictionary with a warning
- message.
-
- Findaffix and munchlist require tremendous amounts of temporary
- file space for large dictionaries. They do respect
- the TMPDIR environment variable, so this space can be
- redirected. However, a lot of the temporary space needed
- is for sorting, so TMPDIR is only a partial help on systems
- with an uncooperative sort(1). ("Cooperative" is
- defined as accepting the undocumented -T switch). At its
- peak usage, munchlist takes 10 to 40 times the original
- dictionary's size in Kb. (The larger ratio is for dictionaries
- that already have heavy affix use, such as the one
- distributed with ispell). Munchlist is also very slow;
- munching a normal-sized dictionary (15K roots, 45K
- expanded words) takes around an hour on a small workstation.
- (Most of this time is spent in sort(1), and
- munchlist can run much faster on machines that have a more
- modern sort that makes better use of the memory available
- to it.) Findaffix is even worse; the smallest English
- dictionary cannot be processed with this script in a mere
- 50Kb of free space, and even after specifying switches to
- reduce the temporary space required, the script will run
- for over 24 hours on a small workstation.
-
- AUTHOR
- Pace Willisson (pace@mit-vax), 1983, based on the PDP-10
- assembly version. That version was written by R. E. Gorin
- in 1971, and later revised by W. E. Matson (1974) and W.
- B. Ackerman (1978).
-
- Collected, revised, and enhanced for the Usenet by Walt
- Buehring, 1987.
-
- Table-driven multi-lingual version by Geoff Kuenning,
- 1987-88.
-
- Large dictionaries provided by Bob Devine (vianet!devine).
-
- A complete list of contributors is too large to list here,
- but is distributed with the ispell sources in the file
- "Contributors".
-
- VERSION
- The version of ispell described by this manual page is
- International Ispell Version 3.1.00, 10/08/93.